

Section: New Results

Algorithms and high-performance solvers

Participants: Astrid Casadei, Cécile Dobrzynski, Sébastien Fourestier, Damien Genêt, Hervé Guillard [Pumas], Laurent Hascoët [Tropics], Cédric Lachat, Xavier Lacoste, François Pellegrini, Pierre Ramet [Corresponding member].

Parallel domain decomposition and sparse matrix reordering

Most of the work within the Scotch project (see Section 5.7) has been carried out in the context of the PhD of Sébastien Fourestier.

The first axis concerns dynamic repartitioning and remapping. A new set of sequential routines has been devised, which offers new features: mapping (including plain partitioning) with fixed vertices, remapping, and remapping with fixed vertices. All of these developments are about to be released in the major release 6.0 of Scotch. The parallel version of the remapping algorithms is being developed and will be part of release 6.1.
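
As an illustration of the sequential interface on which these features rely, the following minimal sketch partitions a small invented graph (a 4-vertex cycle) with the Scotch C API. It shows plain partitioning only; the exact entry points for fixed-vertex mapping and remapping introduced in release 6.0 are version-specific and are not reproduced here.

  /* Minimal sketch: plain k-way partitioning with the sequential Scotch C API.  */
  /* The graph below (a 4-vertex cycle) is invented for the demonstration.       */
  #include <stdio.h>
  #include <stdint.h>
  #include <scotch.h>

  int main (void) {
    SCOTCH_Num verttab[] = { 0, 2, 4, 6, 8 };          /* compact adjacency index array */
    SCOTCH_Num edgetab[] = { 1, 3, 0, 2, 1, 3, 2, 0 }; /* adjacency array               */
    SCOTCH_Num parttab[4];                             /* resulting part of each vertex */
    SCOTCH_Graph graf;
    SCOTCH_Strat strat;

    SCOTCH_graphInit  (&graf);
    SCOTCH_graphBuild (&graf, 0, 4, verttab, NULL, NULL, NULL, 8, edgetab, NULL);
    SCOTCH_stratInit  (&strat);                        /* empty strategy: use defaults  */

    if (SCOTCH_graphPart (&graf, 2, &strat, parttab) == 0)
      for (int i = 0; i < 4; i ++)
        printf ("vertex %d -> part %d\n", i, (int) parttab[i]);

    SCOTCH_stratExit (&strat);
    SCOTCH_graphExit (&graf);
    return 0;
  }

With an empty strategy, Scotch selects its default partitioning methods; the program is typically linked against the scotch and scotcherr libraries.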

Work carried out in the Joint Laboratory for Petascale Computing (JLPC) between INRIA and UIUC resulted in the inclusion of Scotch as a load balancer in the Charm++ parallel environment. A jointly written conference paper has been submitted on this subject. Another potential use of the remapping features of Scotch concerns multi-phase mapping: experiments are being carried out at UIUC regarding the use of Scotch as a multi-phase mapper for the OpenAtom scientific code.

Parallel mesh adaptation

This research topic deals with the design of efficient and scalable software tools for parallel dynamic remeshing. This is joint work with Cécile Dobrzynski, in the context of the PhD of Cédric Lachat, funded by a CORDI-S grant managed by the PUMAS team.

PaMPA (see Section 5.11) is a middleware library dedicated to the management of distributed meshes.

The software development of PaMPA is ongoing. The internal data structure for representing meshes has been frozen, and developments are in progress. The first developments aimed at proving the efficiency of the planned API for handling distributed meshes. A simple P1 FEM Laplacian solver has been written on top of PaMPA by the PUMAS team, to demonstrate how to iterate over PaMPA entities (elements and nodes) and access the values borne by these entities, so as to perform FEM computations. These features are available in version 0.1, which has not yet been distributed to other interested parties. Several potential users, e.g. ONERA, are already willing to try out this version.
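
As a concrete, if simplified, illustration of this iteration pattern, the sketch below assembles a P1 Laplacian stiffness matrix by looping over the elements of a tiny two-triangle mesh and scattering each element contribution to its nodes. The plain C arrays stand in for the distributed entities handled by PaMPA; they are deliberately not PaMPA's API, whose routines are not reproduced here.

  /* Sketch of a P1 Laplacian stiffness assembly: iterate over elements, gather  */
  /* their node data, and scatter the local contributions into a global matrix.  */
  /* Plain arrays replace the distributed mesh entities that PaMPA would manage. */
  #include <stdio.h>

  #define NNODE 4
  #define NELEM 2

  static const double coord[NNODE][2] = { {0,0}, {1,0}, {1,1}, {0,1} };
  static const int    elem[NELEM][3]  = { {0,1,2}, {0,2,3} };   /* two triangles */

  int main (void) {
    double K[NNODE][NNODE] = {{0}};                /* global stiffness matrix    */

    for (int e = 0; e < NELEM; e ++) {             /* iterate over elements      */
      const int *n = elem[e];
      double x[3], y[3];
      for (int i = 0; i < 3; i ++) {               /* gather node coordinates    */
        x[i] = coord[n[i]][0];
        y[i] = coord[n[i]][1];
      }
      double det  = (x[1]-x[0])*(y[2]-y[0]) - (x[2]-x[0])*(y[1]-y[0]);
      double area = 0.5 * (det > 0 ? det : -det);
      double bx[3] = { y[1]-y[2], y[2]-y[0], y[0]-y[1] };  /* P1 basis gradients */
      double by[3] = { x[2]-x[1], x[0]-x[2], x[1]-x[0] };  /* (scaled by det)    */

      for (int i = 0; i < 3; i ++)                 /* scatter local stiffness    */
        for (int j = 0; j < 3; j ++)
          K[n[i]][n[j]] += (bx[i]*bx[j] + by[i]*by[j]) / (4.0 * area);
    }

    for (int i = 0; i < NNODE; i ++) {             /* print the assembled matrix */
      for (int j = 0; j < NNODE; j ++)
        printf ("%7.3f ", K[i][j]);
      printf ("\n");
    }
    return 0;
  }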

PaMPA is already used as the data structure manager for two solvers being developed at INRIA: the Plato solver developed by the PUMAS team, and the Aerosol next-generation fluid dynamics solver developed in the context of the PhD of Damien Genêt. The interaction with these users allows us to refine the interface to match their needs.

This work now focuses on the core of the PhD of Cédric Lachat: interfacing PaMPA with MMG3D to demonstrate the ability of PaMPA to perform parallel mesh adaptation.

High-performance direct solvers on multi-platforms

New supercomputers incorporate many microprocessors, each of which comprises one or more computational cores. These new architectures induce strongly hierarchical topologies, referred to as NUMA architectures. In the context of distributed NUMA architectures, and in collaboration with the INRIA RUNTIME team, we study optimization strategies to improve the scheduling of communications, threads and I/O. Sparse direct solvers are a basic building block of many numerical simulation algorithms. We have developed a dynamic scheduling scheme designed for NUMA architectures in the PaStiX solver. The data structures of the solver, as well as its communication patterns, have been modified to meet the needs of these architectures and of dynamic scheduling. We are also interested in dynamically adapting the computation grain so as to use multi-core architectures and shared memory efficiently. Experiments on several numerical test cases have been performed to demonstrate the efficiency of this approach on different architectures.
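
The following didactic sketch shows what dynamic scheduling means in this context: worker threads pull tasks from a shared pool at run time instead of following a precomputed static assignment. It is a generic pthreads illustration, not the NUMA-aware scheduler actually implemented in PaStiX.

  /* Generic illustration of dynamic task scheduling with a shared task pool.    */
  /* Compile with -pthread.  Each "task" stands for a block computation.         */
  #include <pthread.h>
  #include <stdio.h>

  #define NTASK   16
  #define NTHREAD  4

  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static int next_task = 0;                 /* index of the next task to hand out */

  static void *worker (void *arg) {
    long id = (long) arg;
    for (;;) {
      pthread_mutex_lock (&lock);           /* dynamically grab the next task     */
      int t = (next_task < NTASK) ? next_task ++ : -1;
      pthread_mutex_unlock (&lock);
      if (t < 0) break;                     /* no work left                       */
      printf ("thread %ld executes task %d\n", id, t);
    }
    return NULL;
  }

  int main (void) {
    pthread_t th[NTHREAD];
    for (long i = 0; i < NTHREAD; i ++)
      pthread_create (&th[i], NULL, worker, (void *) i);
    for (int  i = 0; i < NTHREAD; i ++)
      pthread_join (th[i], NULL);
    return 0;
  }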

In collaboration with the ICL team from the University of Tennessee and the RUNTIME team from INRIA, we are evaluating how to replace the scheduling driver of the PaStiX solver with one of the generic task-graph execution frameworks, DAGuE (see http://icl.cs.utk.edu/dague/overview/) or StarPU (see http://runtime.bordeaux.inria.fr/StarPU), so as to execute the task graph corresponding to a sparse factorization. This work is now the core of the PhD of Xavier Lacoste, whose aim is to study and design algorithms and parallel programming models for implementing direct methods for the solution of sparse linear systems on emerging computers equipped with GPU accelerators. This is joint work with the MUMPS team, and the algorithms will have to be adapted in order to exhibit parallelism that is more suitable for the dynamic scheduling of computational tasks on such heterogeneous architectures.
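
To give a flavour of what delegating a task graph to such a generic runtime looks like, the sketch below submits a chain of tasks to StarPU; because they all access the same data handle in read-write mode, the runtime orders them automatically, much as it would for the dependent tasks of a factorization. Function and field names assume the StarPU 1.x C API; this is an illustration only, not PaStiX code.

  /* Hedged sketch, assuming the StarPU 1.x C API: four tasks touching the same  */
  /* data handle in read-write mode are serialized by the runtime's implicit     */
  /* data dependencies, forming a small task graph executed by the scheduler.    */
  #include <starpu.h>
  #include <stdint.h>
  #include <stdio.h>

  static void scale_cpu (void *buffers[], void *cl_arg) {
    double *x = (double *) STARPU_VARIABLE_GET_PTR (buffers[0]);
    (void) cl_arg;
    *x *= 2.0;                                 /* stands for a block update       */
  }

  static struct starpu_codelet scale_cl = {
    .cpu_funcs = { scale_cpu },
    .nbuffers  = 1,
    .modes     = { STARPU_RW },
  };

  int main (void) {
    double value = 1.0;
    starpu_data_handle_t handle;

    if (starpu_init (NULL) != 0) return 1;
    starpu_variable_data_register (&handle, STARPU_MAIN_RAM,
                                   (uintptr_t) &value, sizeof (value));

    for (int i = 0; i < 4; i ++) {             /* four dependent tasks, one handle */
      struct starpu_task *task = starpu_task_create ();
      task->cl         = &scale_cl;
      task->handles[0] = handle;
      starpu_task_submit (task);
    }

    starpu_task_wait_for_all ();
    starpu_data_unregister (handle);           /* flush the data back to memory    */
    starpu_shutdown ();

    printf ("value = %g\n", value);            /* 16.0: the tasks ran in sequence  */
    return 0;
  }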

Hybrid direct-iterative solver based on a Schur complement approach

In HIPS, we propose several algorithmic variants to solve the Schur complement system, which can be adapted to the geometry of the problem: typically, some strategies are more suitable for systems coming from the discretisation of a 2D problem, and others for a 3D problem; the choice of the method also depends on the numerical difficulty of the problem. We have a parallel version of HIPS that provides full iterative methods as well as hybrid methods that mix a direct factorization inside the domains and an iterative method on the Schur complement.
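
The following self-contained sketch illustrates this splitting on a toy 3-by-3 system with invented values: the interior block is eliminated by a direct solve, and the reduced interface system S x_G = g, with S = A_GG - A_GI A_II^{-1} A_IG, is then solved separately (here by a trivial scalar division, whereas HIPS applies an iterative method to the Schur complement).

  /* Toy illustration of the Schur complement splitting used by hybrid solvers:  */
  /* 2 interior unknowns (I) eliminated directly, 1 interface unknown (G).       */
  #include <stdio.h>

  /* Direct solve of the 2x2 interior system Aii * x = r (Cramer's rule).        */
  static void solve_interior (const double Aii[2][2], const double r[2], double x[2]) {
    double det = Aii[0][0] * Aii[1][1] - Aii[0][1] * Aii[1][0];
    x[0] = ( Aii[1][1] * r[0] - Aii[0][1] * r[1]) / det;
    x[1] = (-Aii[1][0] * r[0] + Aii[0][0] * r[1]) / det;
  }

  int main (void) {
    double Aii[2][2] = { {4, 1}, {1, 3} };     /* interior block                  */
    double Aig[2]    = { 2, 1 };               /* interior-to-interface coupling  */
    double Agi[2]    = { 2, 1 };               /* interface-to-interior coupling  */
    double Agg       = 5;                      /* interface block                 */
    double bI[2]     = { 1, 2 }, bG = 3;       /* right-hand side                 */

    /* Schur complement S = Agg - Agi * inv(Aii) * Aig.                           */
    double w[2];
    solve_interior (Aii, Aig, w);
    double S = Agg - (Agi[0] * w[0] + Agi[1] * w[1]);

    /* Forward elimination: g = bG - Agi * inv(Aii) * bI.                         */
    double y[2];
    solve_interior (Aii, bI, y);
    double g  = bG - (Agi[0] * y[0] + Agi[1] * y[1]);
    double xG = g / S;                         /* interface solve (iterative in HIPS) */

    /* Back-substitution for the interior unknowns.                               */
    double rI[2] = { bI[0] - Aig[0] * xG, bI[1] - Aig[1] * xG };
    double xI[2];
    solve_interior (Aii, rI, xI);

    printf ("x = (%g, %g, %g)\n", xI[0], xI[1], xG);   /* (-7/43, 23/43, 24/43)   */
    return 0;
  }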

Graph or mesh partitioners are now able to deal with problems that have several billions of unknowns. Solving the corresponding linear systems is clearly the limiting step towards this challenge in numerical simulations. During her PhD, Astrid Casadei will propose solutions for an efficient algorithmic coupling of direct and iterative methods that allows an effective exploitation of all levels of parallelism.

As a preliminary study, we focus on memory issues when building a Schur complement in our direct solver. During the factorization step, memory overhead may occur for two reasons. The first one is due to the fan-in approach, that is to say the local storage of non-local contributions. The second one is due to the coupling matrices (between the direct part and the Schur complement), which remain allocated during the whole computation and are freed only at the end. Our first idea to reduce memory consumption was to postpone the allocation of each block; thanks to a right-looking algorithm, a column block may then be freed as soon as it has been processed. However, many blocks may still be allocated very quickly, and a solution is to use a left-looking scheme when dealing with local contributions. We therefore introduce a mixed version: a right-looking algorithm is used, except for the local contributions in the direct part, where a left-looking scheme is applied. First experiments show that substantial memory reductions can be achieved.
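
To fix the terminology used above, the sketch below contrasts the right-looking and left-looking orderings on a small dense, unpivoted LU factorization with invented values: the right-looking variant eagerly updates the trailing columns, whereas the left-looking variant applies all pending updates to a column only when that column is about to be factored, which is the behaviour we exploit to delay allocations. This is a didactic dense example, not the supernodal block algorithm of PaStiX.

  /* Didactic contrast of right-looking vs left-looking update orderings on a    */
  /* dense, unpivoted, in-place LU factorization.  Both give the same factors.   */
  #include <stdio.h>
  #include <string.h>

  #define N 3

  static void lu_right_looking (double a[N][N]) {
    for (int k = 0; k < N; k ++) {
      for (int i = k + 1; i < N; i ++) a[i][k] /= a[k][k];
      for (int i = k + 1; i < N; i ++)         /* eagerly update trailing matrix  */
        for (int j = k + 1; j < N; j ++)
          a[i][j] -= a[i][k] * a[k][j];
    }
  }

  static void lu_left_looking (double a[N][N]) {
    for (int j = 0; j < N; j ++) {
      for (int k = 0; k < j; k ++)             /* pull updates from earlier columns */
        for (int i = k + 1; i < N; i ++)
          a[i][j] -= a[i][k] * a[k][j];
      for (int i = j + 1; i < N; i ++) a[i][j] /= a[j][j];
    }
  }

  int main (void) {
    double a[N][N] = { {4, 2, 1}, {2, 5, 3}, {1, 3, 6} };
    double b[N][N];
    memcpy (b, a, sizeof (a));

    lu_right_looking (a);
    lu_left_looking  (b);

    double diff = 0.0;                         /* compare the two factorizations   */
    for (int i = 0; i < N; i ++)
      for (int j = 0; j < N; j ++) {
        double d = a[i][j] - b[i][j];
        if (d < 0) d = -d;
        if (d > diff) diff = d;
      }
    printf ("max difference between the two variants: %g\n", diff);   /* 0 */
    return 0;
  }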